Resource Library: Multimedia

Text File | 1992-04-12 | 62.7 KB | 1,106 lines

CHAPTER IV RESULTS AND DISCUSSION OF FINDINGS Purpose This case study addressed the information retrieval design potential of hypertext systems. It studied the implementation of desirable information retrieval features within a working hypertext authoring system. The study focused on hypertext implementation of traditional print-oriented information retrieval methods. The investigator wanted to determine the extent to which these traditional methods could be achieved, emulated, or otherwise incorporated within a commercial, end-user implementation of a hypertext system. Results of the Case Study The investigator developed the previously mentioned Information Access Model (IAM) <app-a> as a conceptual representation of the approaches and features used in traditional information retrieval systems. He refined the IAM outline to produce an interview schedule <app-b>. The investigator interviewed Neil Larson and Tony Phillips, the two principals involved in the production and application of the subject hypertext system, during March 11-15, 1991. The interviews were at the separate hypertext developer and publisher sites in Berkeley and Kentfield, California. As described in Chapter III, the investigator provided the subjects with copies of the schedule prior to the interview session. He also reviewed the contents with them prior to the interview, to ensure understanding of the terminology and question intent. The sessions were recorded in note form and also tape-recorded. The investigator later reviewed the notes and tape recordings in detail to produce detailed interview summaries (See Appendixes D <app-d> and E <app-e>). The summaries were then used to prepare summary tables and graphs presenting results of the IAM schedule responses. As the actual software developer, Larson was more familiar with the technical abilities of the system. Phillips is the author/producer of the large hypertext system described previously. 
His focus on the pragmatic editorial approach occasionally led him to categorize various schedule items as being achievable or as a matter of editorial option, when the functionality was actually already present. The investigator was able to determine this, since he had extensive conversations with both individuals. Accordingly, the table summaries of system abilities are based more on Larson's interview schedule. Yet, the interview summaries record both individuals' responses to the items. However, the summaries provided only a base for the final tabular ratings. The investigator first compared the responses of the two interviewees in detail. He then balanced the interview findings with a number of supporting external information sources. These included: detailed examination of system documentation and the MaxThink newsletter; direct observation of the DaTa CD-ROM production operation; informal interviews with several MaxThink users; direct hands-on use of the software; and follow-up telephone conversations with the principals. The results of the ratings are presented below. The discussion is grouped according to the major divisions of the IAM outline: A) Archive & Transaction Support System; B) Information Access System; and C) Control Mechanisms. Section A of the IAM schedule will be summarized in text discussion. Discussion of results in sections B and C of the schedule will begin with the respective summary tables of findings, then continue with discussion of the individual items. Each part will conclude with a short summary, using graphs to portray the grouping of results. Archive and Transaction Support System The investigator began with a conceptual model of traditional information retrieval systems. This model was mentioned earlier in this paper, as the Information Access Model (IAM). The complete IAM is found in Appendix A <app-a>. The investigator reviewed major writers on the topic, from those identified in Chapter II. 
For the purposes of constructing the IAM, he relied most on Borko and Bernier (1978) <refs -borko>, Cleveland and Cleveland (1990) <refs -cleveland>, Foskett (1982) <refs -foskett>, Meadow (1973) <refs -meadow>, Milstead (1984) <refs -milstead>, Taylor (1986) <refs -taylor>, and Vickery (1973) <refs -vickery>. Background of the Model The IAM <app-a> was a concise definition or portrayal of a generalized information retrieval system. It was intended as a global listing of information retrieval features. The IAM document served partially to communicate the information retrieval concepts to the subject interviewees and partially to focus the study on assessment of relevant information access features. There were obvious similarities in portrayals of information systems by the above-listed writers. The systems were generally described as having the purpose of allowing users to effectively and efficiently access the contents of documents or records covered by the particular system. As Foskett writes <refs -foskett>: . . . the problem that we have to face is that of ensuring that individuals who need information can obtain it with the minimum of cost (both in time and money), and without being overwhelmed by large amounts of irrelevant matter. (1982, 1) This reflects the general pragmatic approach of these writers, who appear to view these systems as practical tools, rather than abstract or theoretical representations of knowledge. 
Cleveland and Cleveland <refs -cleveland> summarize the functions of an information system as follows: For an information system to carry out the information process, at least six distinct functions are required: (1) the acquisition of the necessary and appropriate documents, (2) the preparation and representation of the content of these documents, (3) the coding of the content indicators for ease of manipulation, (4) the organized storage of those documents and their indicators in separate files, (5) the development of operational search strategies, and (6) the physical dissemination of the retrieval results. At the center of this system is the procedure that identifies and represents the content of the collection to the user; in most cases this is an index (1990, 38). Traditional, paper-based information retrieval systems have generally used some form of the index approach for access to documents. This same approach has carried over to the present generation of automated information retrieval systems. For example, Vickery <refs -vickery> notes that the general point of entry into an information system is a "list of words," whether it be an index, table of contents, or linked to a classification code (1973, 87). He also describes three main approaches to creating the representation of document content in the index file. These are: (1) simple extraction of terms from the source document; (2) selective extraction of terms, guided by frequency of usage or significance in the source document; and (3) assignment of pre-existing keys, or a defined indexing vocabulary. Both Vickery (1973) <refs -vickery> and Meadow (1973) <refs -meadow> note the importance of a standardized index language, or controlled vocabulary, to ensure consistent usage by different indexers, as well as between indexers and index users. Problems of consistency arise as index languages become larger and more complex. 
Index language use becomes more difficult for indexers, resulting in lowered productivity, index quality problems, and higher costs of indexing. The larger language can also be much more difficult for the index user to comprehend and use (Meadow 1973). After digesting major writers addressing information retrieval system description, the investigator developed generalized flowchart representations illustrating working information retrieval systems. These flowcharts were used in creating the IAM and in describing terminology to the interviewees. The general level flowchart is included as Figure 1. >>>> FIGURE 1 GOES HERE This simplified representation of information system processing parallels many of the functions described above by Cleveland and Cleveland. The entry block at the top represents document selection and acquisition. This is followed by the document analysis operation. The next operation involves document concept identification and representation of the document by keys or descriptors. Flowchart blocks on either side of this "key creation" process represent application of the indexing or classification approaches, and the editorial and quality control mechanisms selected for the particular information retrieval system implementation. The next operation involves final processing and production of the information system. The terminal operation represents the completed, distributed information system. Figure 2, depicting the same general process, is taken from Vickery (1973, 88) <refs -vickery>. This flow diagram emphasizes the duality of the retrieval language used by both the indexer and user. Vickery shows that both parties must use a common retrieval language in order to produce compatible index record and query forms. The system performance will reflect the success by both parties in the appropriate use of this common language. 
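Vickery's point about a common retrieval language can be illustrated with a minimal sketch. It assumes a tiny controlled vocabulary in which variant terms map to one preferred term, and it applies the same mapping on the indexer side and the user side. The vocabulary entries, document identifiers, and function names are hypothetical and are not drawn from the systems studied here.

```python
# Hypothetical controlled vocabulary: variant term -> preferred term.
PREFERRED = {
    "automobile": "cars",
    "auto": "cars",
    "cars": "cars",
    "veterinary medicine": "animal health",
    "pet care": "animal health",
}


def to_retrieval_language(term):
    """Map a free-text term to its preferred form, or None if it is not in the vocabulary."""
    return PREFERRED.get(term.strip().lower())


def build_index(records):
    """Indexer side: file each document under the preferred form of its assigned terms."""
    index = {}
    for doc_id, terms in records.items():
        for term in terms:
            key = to_retrieval_language(term)
            if key is not None:
                index.setdefault(key, set()).add(doc_id)
    return index


def search(index, query_term):
    """User side: translate the query with the same vocabulary before lookup."""
    key = to_retrieval_language(query_term)
    return sorted(index.get(key, set()))


records = {"doc1": ["Automobile"], "doc2": ["Pet Care", "auto"]}
index = build_index(records)
print(search(index, "cars"))
print(search(index, "Veterinary Medicine"))
```

Because both parties pass through the same mapping, a query phrased as "cars" finds documents indexed as "Automobile" or "auto". A term outside the vocabulary retrieves nothing, which is the consistency problem the writers describe.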
>>>> FIGURE 2 GOES HERE Going one step further, Figure 3 summarizes the design components of the standardized indexing or classification approach. This is the index language which is applied in the general information system processing discussed above, and illustrated in Figure 1. This flowchart shows that the standardized index or classification approach results from a combination of decisions. These include design decisions regarding: choice of access points; of available information access methodologies or operations (e.g. index use, full text searching, hierarchical or taxonomic approaches); and finally, the design of editorial and quality control procedures. >>>> FIGURE 3 GOES HERE System Study Results The investigator extensively analyzed the production and implementation of the previously described DaTa hypertext system for accounting and auditing domain information. This is the main information system publication produced using the subject MaxThink authoring system. He studied the DaTa product in detail during the site visit, interviewed the principals extensively, and examined the authoring system software and documentation in detail. He also obtained several issues of the DaTa CD-ROM hypertext for later examination. Figure 4 is a flowchart of the general production process of the subject information system. The production process is described in detail in the interview summaries, Appendixes D <app-d> and E <app-e>, particularly in the Phillips interview summary. >>>> FIGURE 4 GOES HERE The interviewer saw many parallels between the DaTa production operation, and the generalized information retrieval system workflow model. The DaTa operation begins with acquisition of source text (mostly in hardcopy form). The first document processing step is thus to use optical character recognition (OCR) scanning for conversion of hard copy to machine-readable format. These first operations are comparable to standard document acquisition. 
This is followed by the editorial operation of splitting text into smaller, single topic, text nodes, and reformatting it for best display screen presentation. This corresponds to the document processing step in Figure 1. The third step is the major intellectual operation of adding hypertext organization and embedding links into the document text. This involves several operations: updating of the network hierarchies, inserting of the hypertext links into document text, and generation of the Keyword Out of Context (KWOC) index. This part of the operation clearly parallels the document representation and creation of descriptors/keys in traditional information retrieval system processing. The final steps of DaTa production involve the update processing, manufacturing, and distribution steps of the operation. These steps are identical to the functions of the last two steps of the generalized information retrieval system workflow depicted in Figure 1. The investigator has worked in and managed major unit record files (newspaper clipping files), as well as newspaper index and text database operations. It was clear to him that the workflow of the DaTa hypertext system was a sophisticated information retrieval system production operation. Although untrained in traditional information retrieval system methods, the system designers nevertheless arrived at pragmatic equivalents to many standard techniques. The practicality of the information retrieval design orientation is illustrated by Larson's description of his system design goals, presented during the interviews. He summarized these as: 1. Emphasis on ease of use . . . the simplest, easiest, most intuitive, possible user interface. In Larson's words, "So you never have to think about it." 2. Designing for the "lowest common denominator" hardware platform. The DaTa system runs on any IBM-compatible hardware, from the earliest Intel 8088 chip IBM-PCs to the current Intel 80486 chip units. 
It runs on any version of MS-DOS starting with Version 2.1. Random Access Memory (RAM) requirement is a modest 512K. The system will work with either monochrome or color monitors. 3. Providing a sophisticated domain area matrix (hierarchical network taxonomy), with a great many highly redundant and highly cross-referenced information approach trails. 4. Providing overlapping, complementary information access methods, which are characterized as: a. Taxonomic approach, using hierarchical networks; b. Linguistic approach, using online KWOC index and glossary; c. Associative network approach, using embedded hypertext links. Information Access System This section <app-b 2 19> of the IAM covered the features within the hypertext system which supported information access. The evaluation and tabular recording of IAM responses involved detailed analysis of the interview summaries, balanced against corroborating evidence. This evaluation process was explained in detail earlier in this chapter, in the "Results of the Case Study" section. "Section B.1." Access Points These items <app-b 2 21> addressed information system mechanisms for providing access by different "avenues of approach." The access points included are representative of those used in various traditional information systems. Access Point Item Responses Table 1 lists the results of the responses to the Access Points items. Eleven of the fourteen were rated as being present in the subject hypertext system. Three of the items were rated as easily achievable through editorial decision or use of external software. <TABLE1> Item B.1.a., the Main File Sequence, referred to the suitability of the basic organizational sequence as an access point. Traditional information systems implementations allowing this include document classification systems, alphabetical filing systems, and sequentially numbered or coded transaction files. The MaxThink system uses ASCII disk files, with alphanumeric DOS file names. 
These document files may optionally be further broken down for storage in named hard disk subdirectories. Such a storage approach lends itself to the formation of topical subdirectories and coded or standard file naming approaches. This item was rated Present. Item B.1.b. covered retrieval by author. This was rated present, since many MaxThink system retrieval features may be used for author access. These features include hierarchical taxonomies, online indexes, subdirectory organization, and other approaches. Item B.1.c. covered title access. This is occasionally provided in the DaTa CD product. Title representation is achievable upon editorial decision. This item was rated as present. Items B.1.d. through B.1.d.ii, including name forms, personal and corporate names, may all be optionally provided by editorial decision. They were therefore rated as present. Item B.1.e. referred to keyword retrieval. This is available using several approaches, primarily the "Glossary" (TM) KWOC index. This is an online index covering file descriptive header text lines as well as the text of the added taxonomical descriptions. The Glossary (TM) index excludes stopwords. MaxThink hypertexts also provide gateway interface to "SEARCHWORD," a string-searching program module, and to "CD-INDEX," a full text index access module. Hypertext system links can also transparently execute or "call up" other string-searching or text database external programs. Keyword access was therefore rated as present. B.1.f. addressed subject, topical, or concept access. This is provided by several approaches, including hierarchical taxonomy, hypertext network interconnections, hypertext associative links, and the KWOC index. This item was rated as present. Items B.1.g. through B.1.j. dealt with geographic, date or chronological, [foreign] language, and document format access points. These may all be achieved or provided as a matter of editorial decision. 
The system indexing mechanisms are provided by the various retrieval functions or approaches within the system, as described above. The items were rated as present. They may optionally be provided by using interface to external software. Item B.1.k., access to document position or location, can optionally be provided by editorial decision. The principals felt it would be labor-intensive and impractical to do this using the hierarchy or hypertext links. They recommended this be accomplished by interfacing to an external searching program with this ability. The item was rated as easily achievable. Item B.1.l. [the last character is the letter L] covered retrieval via automated search of data in specified field locations. Boolean searching and field specification database features are not present in the subject hypertext system. The system can, however, provide such access by interface execution of a program with these capabilities. For example, this investigator has built hypertext systems with the subject software package, using link execution of external text database programs. These programs included Zyindex (TM), BiB/SEARCH (TM), and Nutshell Plus (TM). This feature was therefore rated as easily achievable. Access Points Summary All traditional access points can be implemented with the subject hypertext system, or accomplished by interface to external programs. This was established during the interviews, and verified by direct examination of the subject system authoring software and the DaTa CD-ROM hypertext application. Figures 5 and 6 graphically illustrate the proportional placement of the responses. Figure 5 shows that 78.6% of the access points were present in the subject application. The remaining 21.4% fell into the easily achieved category. Figure 6 presents more detailed information in bar graph form. Eleven of the access point items were present, three were easily achievable, and no items were categorized as not possible or practical. 
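The proportions reported for the access point ratings follow directly from the item counts. A minimal sketch of the arithmetic is given here. The function and variable names are illustrative only; the counts are those stated above for the fourteen access point items.

```python
def proportions(counts):
    """Convert rating counts into percentages of the total, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts reported for the fourteen access point items.
access_points = {
    "present": 11,
    "easily achievable": 3,
    "not possible or practical": 0,
}

print(proportions(access_points))
# {'present': 78.6, 'easily achievable': 21.4, 'not possible or practical': 0.0}
```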
>>>> FIGURES 5 AND 6 GO HERE "Section B.2." Access Approaches The items in Section B.2. <app-b 6 6> of the schedule addressed the information system access approaches, devices, or methods provided by the hypertext system. The IAM features represented by these schedule items were based on traditional information retrieval approaches. Access Approaches/Systems Responses Table 2 gives the results of the access approaches section responses. Seven of the nineteen items were rated as being present in the subject hypertext system. Eleven of the items were rated as easily achievable through editorial decision or use of external software. Only one of the nineteen items was judged as not possible or practical. <TABLE2> The first part of the access approaches section, B.2.a., covered the general classification scheme approaches. Item B.2.a.i. dealt with hierarchical taxonomy ability. Hierarchical knowledge representations, or taxonomies, are regarded as one of the easiest to use, yet most effective, approaches to information retrieval. The obviousness of well-designed choices which are presented at each level of even a complex hierarchical structure means that even novices are able to use them effectively (Meadow 1973) <refs -meadow>. Glynn and Di Vesta (1977) <refs -glynn> have shown that the use of a hierarchical or outline structure measurably aided subjects in better comprehension of a knowledge domain and its component relationships. They reported that the logical and coherent approach of hierarchical learning and retrieval aids also helped subjects perform better in recalling and inferring specific facts. The basic access design system for the MaxThink system hypertexts grows from the efficient use of taxonomic structures and complex networking interlinking. The Houdini (1987a) <refs -houdini> "three-dimensional outliner" allows easy interconnection of separate hierarchies into complex matrix networks. 
This matrix outliner has the ability to quickly link any network node or ASCII filename reference to any other point in the network. It can also link between separate networks. Therefore, the author is not limited to one inflexible hierarchy. Instead, the author can use rich interconnection across multiple hierarchical taxonomies. The author can place a single item into many appropriate retrieval hierarchy paths, in a manner similar to filing under multiple entries in a card catalog. When advisable, the author may also interconnect entire hierarchical levels within and across networks. For example, there might be an interconnection from a subtopic of the "Pet Care" hierarchy, across to the "Veterinary Medicine" network, so it may also function as a subtopic of an "Immunization Research" topic. The MaxThink (1987b) <refs 16 4> basic outliner and Houdini matrix outliner programs both support the creation of these intertwined networks. This ability to add complex multiple dimensions to hierarchies or outlines retains the basic ease of use of an outline structure, yet adds representational and retrieval power far above that of simple or "flat" hierarchical structures (Danielsen 1989) <refs -danielsen>. Besides allowing creation of clear, understandable hierarchical knowledge structures for the user interface, MaxThink's outliner and matrix outliner tools also add great production efficiencies to the authoring process. The MaxThink tools enable a two-phase approach to hypertext linking. They thus eliminate the necessity for an author to deal with the enormous number of possible links within document texts. The process can now be divided into two more manageable operational steps: 1) "Macro linking" - Document positioning in the global domain matrix. This is handled with the hierarchical or matrix outliners, perhaps the only efficient tools for this work. 
This task involves the major positioning of the document or entity in the correct position/document cluster of the hierarchical networks. This step is comparable to classifying a book into the correct subject location in the Dewey or Library of Congress classifications. The Houdini matrix outliner also allows multiple hierarchical paths or access trails leading to the same document or document cluster. 2) "Micro linking" - Placement of associative links within document texts or images. This task consists of adding the embedded links or jumps to related and relevant items. This may be done using one of the many specialized authoring tools. This is now a relatively quick and easy task, since the author does not have to deal with the global universe of linking possibilities. The hierarchy placement has already placed the document in the proper position in the domain conceptual matrix. Authors now need only deal with making links within a more limited number of closely related documents in a topical cluster, or to other major network hierarchical nodes. The end result of this workmanlike approach is surprising speed in handling document or hypermedia item insertion into the networks. As illustration, Phillips processes approximately 1000 screens per week, including placement in the domain taxonomy, and embedding of internal associative links. Wayne McPhail, president of Metaphor - The Hypermedia Group, in Hamilton, Ontario, a hypertext and hypermedia production group, is another hypertext author who appreciates the efficiency of the MaxThink approach. He writes <refs -mcphail> that his group originally selected the MaxThink authoring system, primarily because "MaxThink has developed a number of powerful, elegant tools for creating intelligent hierarchical structures . . . which allow users to easily build hierarchical and knowledge matrix systems which can be converted into hyperdocuments . . ." (1991, 461). 
He continues: In the years following the release of [the first] hyperdocument, I explored a number of hypertext systems including KnowledgePro, Guide, Black Magic, and Matrix Layout. Each had its own appeal, but I found myself returning to MaxThink's products because they allowed me to develop hypertexts quickly and efficiently. (1991, 462) This writer has had similar experience, in producing three hypertext systems with the MaxThink authoring system. The text content of these systems ranged from 300,000 to 750,000 characters. For each project, the editorial tasks of designing the document coverage, writing and assembling texts, splitting them into logical nodes, and writing of bridging material, took two to three weeks. In all three cases, the actual building of the hyperdocument network took approximately two hours. This included building the system network, creating hierarchy menu screens, embedding links within the text nodes, and doing final "cosmetic work" on system screens. This is hardly the "labor-intensive" hypertext authoring trap of which many writers have complained or cautioned. The MaxThink hierarchical manipulation tools are both editorially powerful and efficient. Therefore, the B.2.a.i. item dealing with hierarchical taxonomy ability was rated as present. Item B.2.a.ii. dealt with enumerative, universal classifications. This item referred to complex, complete, predefined or fixed, universal classifications, such as the Dewey Decimal or Bliss classifications. The subject principals felt adoption of such a classification to be an editorial option. Hypertext system linkages can be used for effective representation of any desired classification scheme. The developers noted that the import features of the MaxThink outliners would make it quite efficient to import ASCII files carrying the information for either a flat or a hierarchical classification scheme. This ease of importing classification information sidesteps a major obstacle in other systems. 
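As a rough illustration of importing a classification from a plain ASCII file, the sketch below parses an indented outline into a hierarchy and lists every path through it. The file layout (one entry per line, two spaces of indentation per level), the function names, and the abridged Dewey-style labels are assumptions made for this example. The actual import format of the MaxThink outliners is not described in this chapter.

```python
def parse_outline(text, indent=2):
    """Parse an indented ASCII outline into a tree of {'label', 'children'} nodes."""
    root = {"label": "ROOT", "children": []}
    stack = [root]  # stack[d + 1] holds the most recent node seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        depth = (len(line) - len(line.lstrip(" "))) // indent
        node = {"label": line.strip(), "children": []}
        del stack[depth + 1:]          # drop nodes deeper than this entry's parent
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def paths(node, prefix=()):
    """Yield every hierarchy path as a tuple of labels, parents before children."""
    for child in node["children"]:
        here = prefix + (child["label"],)
        yield here
        yield from paths(child, here)


# Hypothetical input file contents with abridged, illustrative class labels.
OUTLINE = """\
500 Science
  510 Mathematics
  590 Animals
600 Technology
  630 Agriculture
    636 Animal husbandry
"""

for path in paths(parse_outline(OUTLINE)):
    print(" > ".join(path))
```

Each printed path corresponds to one retrieval trail through the imported scheme. A flat classification is simply the case where no line is indented.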
Transfer of existing taxonomies has been a major problem in previous efforts (Björklund 1990b) <refs 3 4>. This item was rated as being easily achievable. Items B.2.a.iii and B.2.a.iv concerned literary warrant classifications and faceted classifications. As with the preceding item, the principals felt that these classifications could be easily represented with a hypertext taxonomy. Again, an ASCII format base file structure could easily be transported into the MaxThink system representation. These items were rated as easily achievable. Major category B.2.b. covered the general indexing types. The first of these, Item B.2.b.i. was the alphabetical index. This was implemented in the form of the alphabetized substantive element listing for the Glossary (TM) KWOC index. Both interviewees agreed it would be a simple operation to represent standard alphabetical indexes, and to embed hypertext links to the source document texts. They also agreed that it would be more efficient to use external indexing software to create such index listings, than to attempt manual indexing. They advised the use of specialized indexing packages. This item was rated as easily achievable. Item B.2.b.i.A. referred to the selection or assignment of keywords for classification or indexing purposes. The MaxThink system provides authoring utilities for simple term extraction from source documents. They plan to develop more sophisticated term extraction utilities, and have initially examined material covering such approaches (Pao 1978 <refs -pao>; Tenopir 1990 <refs -tenopir>). The Glossary (TM) KWOC system effectively accomplishes simple term extraction from titles and taxonomy content descriptions, using the stylized KWOC format. This item was rated as present. Items B.2.b.i.B. and B.2.b.i.C. covered the use of controlled vocabulary term assignment and relative indexing methods. The principals felt these to be a matter of editorial decision. 
They stated that the hypertext associative linking could easily represent such index approaches. They advised use of external software for efficient maintenance of such vocabularies or indexes. These items were rated as easily achievable. Item B.2.b.ii. covered the general category of term manipulation indexes. This category was rated as present, since a KWOC index is provided. Item B.2.b.ii.A. referred to simple permuted or rotated indexes, often found in the general form of Keyword in Context (KWIC) indexing. The MaxThink production implementation does not presently include this type of index, since the designers preferred the KWOC format as easier for users. The principals agreed that this type of index could effectively be represented in a hypertext representation. Again, they recommended use of an external program to produce a KWIC index. The item was rated as easily achievable. Item B.2.b.ii.B. represented term manipulation indexes, ordered by an extracted term element. This refers to the Keyword Out of Context (KWOC) index type, where the unrotated term context lines are sorted by the substantive or index term, rather than using the rotated line form. KWOC indexing is an integral part of the current MaxThink production implementation. <glossary.txt example of a MaxThink KWOC index> KWOC index production is accomplished by a MaxThink utility program. This item was therefore rated as present. The next group of term manipulation category items were B.2.b.ii.C. and B.2.b.ii.D. The first item covered string indexing, using algorithmic phrase or term relationship manipulation. Well-known examples of this category include PRECIS and CIFT indexing, respectively developed for the British National Bibliography and the Modern Language Association. The second item refers to chain indexing. In this manipulated index form, the constructed index string form reflects the basic embedded taxonomy or hierarchy. 
Cleveland & Cleveland (1990) <refs -cleveland> extensively discuss both of these index types. The interviewees agreed that both of these index types could be represented in the hypertext presentation. They recommended use of external software to create and manage these forms of indexes. Both items were therefore rated as easily achievable. The next item, B.2.b.iii., covered the classified index form. This index type is arranged in the alphanumeric order of a selected classification code. The principals agreed that this was an editorial option, which could be successfully implemented in the hypertext format. They again noted the ease of import of an ASCII file of a classified index table, using the MaxThink outliner software. This would mean efficient import and translation into a manipulable hypertext taxonomy format. The item was graded as easily achievable. Item B.2.b.iv. covered the category of coordinate indexing. This referred to retrieval using assigned descriptor or index terms, using simple or combined term queries. The CD-INDEX utility of the MaxThink authoring system (output illustrated in Appendix F) produces searchable full text indexes. The search component of the full text searching module delivers simple coordinate retrieval functionality. This item was rated as present. Item B.2.b.iv.A. represented the category of older, non-automated, coordinate searching methods. Some examples include edge-notched cards, "peekaboo" punched-hole card coordinated systems, and terminal digit coordination. These manual methods were judged as inappropriate for an automated implementation. The item was therefore rated as neither achievable nor applicable. Item B.2.b.iv.B. represented the database searching approach to coordinate retrieval. Although the MaxThink systems have string-searching and full text searching modules, they do not possess database features. 
These capabilities would include the ability to specify field searching, to search for the presence or absence of field values, or to search for combinations of text and field values. The interviewees agreed that this ability could be added by editorial decision, using link execution of an appropriate external program. The item was rated as easily achievable. Item B.2.b.iv.C. covered full-text searching ability. As mentioned, the MaxThink systems offer simple string-searching and full text indexed retrieval program modules. These modules do not have sophisticated text retrieval abilities. Software developer Larson is subjectively opposed to dependence upon full text searching techniques, pointing to the many studies which demonstrate poor or uneven retrieval performance of the approach (Blair and Maron 1985 <refs -blair>; 1990 <refs 3 18>). He is therefore emphatically committed to the taxonomic and associative linking approaches. However, he has responded to hypertext author and end user demands by producing the text searching modules mentioned above. The MaxThink software automatically generates lists of hypertext links in response to user-specified terms. This approach is generally described as "dynamic linking" (Frisse 1988) <refs -frisse>. Larson's interface predictably uses hypertext link calls to execute the search modules. The searches generate lists of links to both text file nodes and hierarchy entries containing the desired terms. Use of these dynamically generated link lists combines the specificity of text searching with the guidance value of existing embedded hypertext links. The user can use the generated list to make hypertext jumps to found items, and can also use the links within those items to continue his or her search. Larson's decision to also generate links into the hierarchy entries means that his text searching module gently guides the user back into the sophisticated taxonomy approach.
This gives the user the benefit of both text searching brute force and the structured taxonomy. The interviewees both agreed that their hypertext system developers or end users have the additional option of using link execution of more sophisticated external text searching programs. The item was rated as present. Appendix F <app-f> contains multiple screen print illustrations demonstrating the text-searching module. ***> NOT INCLUDED IN THIS HYPERTEXT VERSION <*** Item B.2.b.v. referred to the faceted indexing approach. The interviewees concurred that provision of this style of access is an editorial option. They noted that it would be most effective to use external software to create a faceted indexing file, and then import it into the MaxThink system for translation into the hypertext format. This category was therefore rated as easily achievable. Item B.2.b.vi. referred to the citation indexing approach. Both interviewees agreed that a citation index file could be created using a separate external program, and imported into the MaxThink system for translation into the hypertext format. The category was rated as easily achievable. Access Approaches/Systems Summary The responses to this section of the interview showed that all but approximately 5% of the traditional access approaches or systems can be implemented with or through the subject hypertext system. Figures 7 and 8 graphically illustrate the proportional placement of responses. Figure 7 shows that 36.8% of the approaches were present in the subject system. Many more of the items, some 57.9%, were in the easily achieved category. Only 5.3% of the approaches were rated in the not possible or practical group. Figure 8 presents this information in bar graph form. Seven of the items were present, eleven were easily achievable, and one item was categorized as not possible or practical.
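The percentages quoted for this section follow directly from the item counts (seven, eleven, and one, out of nineteen). The short Python sketch below reproduces that arithmetic. The function name `rating_shares` and the dictionary labels are illustrative assumptions and are not part of the study instrument.

```python
def rating_shares(counts):
    """Return each rating's share of the total, as a percentage to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Item counts reported for the Access Approaches/Systems section.
access_approaches = {
    "present": 7,
    "easily achievable": 11,
    "not possible or practical": 1,
}

shares = rating_shares(access_approaches)
for label, pct in shares.items():
    print(f"{label}: {pct}%")  # 36.8%, 57.9%, 5.3%
```

The same function can be applied to the counts reported for the other sections of the schedule.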
>>>> FIGURES 7 AND 8 GO HERE Only about one-third of the items fell into the present or implemented category, while more than half were rated as easily achieved. This contrasted with the first section of the schedule, where approximately two-thirds of the items were rated as being present. This pattern switch was in great part due to MaxThink developer editorial decision. Many access approaches were potentially possible, but unimplemented. This was due in large part to the principals' emphasis on the creation of taxonomic networks as the main tools for access. The interviewees frequently expressed their editorial view that many traditional information retrieval approaches are difficult to understand and use, and therefore tend to be inappropriate for novices or infrequent users. They emphasized that they have deliberately designed the DaTa hypertext to serve this class of user. "Section B.3." Control Mechanisms Section B.3. <app-b 11 11> of the schedule referred to devices provided for the purpose of editorial and quality control of an information system. Such devices or mechanisms offer control of such areas as taxonomy, vocabulary consistency, entry format, syntax, and item filing sequence. Control Mechanisms Results Table 3 lists the results of the Control Mechanisms section of the study. Out of fifteen items, nine were rated as present, five as easily achievable, and one as not possible or practical. <TABLE3> Item B.3.a. referred to the use of a classification schedule as the basis for the organization of the information system. Most examples of this approach utilize a fixed or published classification hierarchy. It was the opinion of the MaxThink principals that the flexible and adaptive taxonomy networks of their system are equivalent to a dynamic classification system. 
Phillips, the actual author of the DaTa product, in particular, habitually referred to his working representation of the accounting and auditing subject domain area as a "global matrix" or a "conceptual matrix." He used the Houdini matrix outliner tool to efficiently maintain the maze of interconnected or networked area hierarchies. At the time of interview, the current global network consisted of approximately two hundred interconnected networks. This item was rated as present. Item B.3.b. addressed traditional approaches to the maintenance and application of controlled subject vocabularies. The interviewees stated that vocabulary control was an editorial option, and agreed that it was basically necessary for information retrieval system quality control. They referred to several of their own basic applications of the concept. They use syntax and plural policies in building their KWOC index; they are developing synonym and thesaurus control utilities; they use stopword lists for the KWOC index. The item was rated as present. Item B.3.b.i. specifically addressed the maintenance of simple authority or headings files. Both principals agreed that this was an editorial decision, for optional inclusion. The MaxThink system does not presently include any kind of authority maintenance utility. They felt this function could be achieved using either manual or external software means. The item was rated as easily achievable. Item B.3.b.ii. referred to thesaurus maintenance. This was intended to identify a more sophisticated concept control approach than a simple authority list. The thesaurus authority file generally shows the full scope of term coverage, the relationships of broader terms, narrower terms, related items, and guides from synonymous to preferred terms (Cleveland & Cleveland 1990) <refs -cleveland>. The interviewees do not presently maintain full thesaurus control, but felt it was a desirable editorial option. 
They felt this function could be performed externally by using either manual or automated thesaurus maintenance. At time of writing, Larson had informed the investigator that MaxThink is currently developing a thesaurus maintenance program (Neil Larson, telephone interview, August 9, 1991). The item was rated as easily achievable. Item B.3.b.iii. covered the use of derived term methods. This refers to terms extracted directly from source document text, using either manual or automated processing approaches. Both agreed that this was an editorial option. DaTa author Phillips felt that a domain expert would not need such methods to extract document concepts; Larson felt that third party or external utility software could be useful. However, he is considering development of term extraction utilities for quick identification of general content. This item was rated as easily achievable. Item B.3.b.iv. referred to use of a hierarchical searching thesaurus. This approach is sometimes used in full text or bibliographic index database systems. It allows a searcher to optionally use hierarchical term relationships to aid in term searching. This approach is not relevant to the hypertext associative linking approach, since it is not a "searching" retrieval approach, and cannot utilize this method. Because of this, the item was rated as not achievable or applicable. However, the interviewees noted that hypertext system authors may use link calls to execute an external database program. They agreed that hypertext system authors could easily provide the searching thesaurus approach by using an external program with this facility, such as Zyindex (TM) or MicroBASIS (TM). Item B.3.b.v. covered the generic approach of controlling term entry form, for vocabulary consistency. The interviewees felt this was an editorial decision, and could be achieved using either manual or automated means. The item was rated as present. Item B.3.b.v.A. 
referred to control of entry syntax, such as preference of noun or adjectival form, and entry construction approach. This was judged a matter of editorial policy. It may be handled manually or with external program support. At time of interview, the MaxThink DaTa product operation used manual application of entry syntax policy. The item was rated as present. Item B.3.b.v.B. referred to the standardization of entry "number," or consistency in singular or plural usage form. Again, the interviewees judged this a matter of editorial policy decision, which may be handled either manually or with external program support. The MaxThink DaTa product operation presently uses automatic depluralization in the Glossary (TM) KWOC index-building program. The item was rated as present. Item B.3.b.v.C. covered automatic depluralization in database searching. This method allows retrieval of either singular or plural noun forms, in response to entry of either a singular or plural query term. This is not relevant to the hypertext retrieval approach. However, the interviewees noted this could be achieved by linking to external searching software with such capability. The item was therefore rated as easily achievable. Item B.3.b.v.D. addressed the approach of automated synonym definition. The KWOC index-building program module includes automatic synonym-handling, with cross-reference insertion for terms in the main KWOC index listing. The hypertext system can also use links to external searching programs with synonym definition capability. The item was rated as present. Item B.3.c. referred to the use of a standardized subdivision or facet identification approach to consistently identify document types. This was judged as a matter of editorial decision. The DaTa production operation uses standard filetype naming conventions and special coding to reflect document types. The KWOC index-building program uses the codes to group document types when sorting the KWOC index entries.
This item was rated as present. The approach of term or descriptor relationships was covered in Item B.3.d. This referred to techniques using term roles or links, or to the weighting of terms in database searching. The interviewees observed that this text searching methodology was not relevant to the hypertext associative linking approach. However, it could be achieved by link calls to execute external programs with such abilities. The item was therefore rated as easily achievable. Item B.3.e. covered the use of filing or sorting rules in building an information system. As an automated system, MaxThink presently uses simple ASCII sorting for the KWOC index. It does provide for subsorts by document type or hypertext node type. This is an editorial decision. Other sorting approaches could be incorporated by use of appropriate algorithms. The item was rated as present. Item B.3.f. referred to the use of automated authority and procedural safety or editorial and quality control measures. The MaxThink system uses the Hyperlink (1988) collection of separate utility programs for these purposes (Fersko-Weiss 1991 <refs -fersko>; Perez 1991 <refs -perez>; Urr 1991 <refs 23 8>). The utilities perform such functions as: checking the spread or clustering of associative linking of nodes; checking for blind or erroneous link references and correcting link errors; automatic linking to files containing defined terms or phrases; and importing of ASCII file node names into the matrix outliner (for efficient translation into the conceptual taxonomy). These utilities are fully described in MaxThink system documentation (TransText 1990) <refs -transtext>. The DaTa production operation additionally employs standard computer operating procedures to ensure information system security. This includes duplicate working copies, regular backup of working files, off-site storage of copies of CD-ROM masters and tape duplicates, and other such methods. Item B.3.f. was therefore rated as present.
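Several of the control mechanisms described above (stopword lists, singular/plural normalization, and simple ASCII-order sorting) come together in the construction of a KWOC index. The following Python sketch illustrates the general technique only. It is not the MaxThink Glossary program, whose internals are not described in this study; the stopword list, the naive depluralization rule, and the names `STOPWORDS`, `depluralize`, and `kwoc_index` are all assumptions made for illustration.

```python
# Hypothetical stopword list; a production list would be an editorial decision.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def depluralize(word):
    """Naive singular form: strip one trailing 's' (but not 'ss').

    This is a deliberate simplification and will mishandle words
    such as 'analysis'.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def kwoc_index(titles):
    """Build KWOC entries: (keyword, unrotated context line), sorted by keyword."""
    entries = set()
    for title in titles:
        for raw in title.split():
            word = raw.strip(".,;:()\"'").lower()
            if not word or word in STOPWORDS:
                continue
            entries.add((depluralize(word), title))
    # Keywords are lowercased, so Python's code-point ordering behaves
    # like a simple ASCII sort; ties are subsorted by the context line.
    return sorted(entries)


titles = ["Auditing of Leases", "Lease Accounting Standards"]
for keyword, context in kwoc_index(titles):
    print(f"{keyword:<12}{context}")
```

In this sketch the two titles share the single heading "lease", because "Leases" is reduced to its singular form before sorting. A subsort by document type, as described under Item B.3.e., could be added by placing a type code between the keyword and the context line in each entry.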
Control Mechanisms Summary Once again, the great majority of items, 93.3%, were rated as present or easily achieved. Figures 9 and 10 graphically present these results. In this section of the study, nine of the items (60%) were present; five of the items (33.3%) were rated as easily achieved; and one (6.7%) was rated as not possible or practical. >>>> FIGURES 9 AND 10 GO HERE This pattern was similar to the first section, with the majority of items falling into the present or implemented category. The investigator felt that this reflected the principals' ongoing hypertext publishing activity. They are regularly producing a large and complex hypertext product, and have had to devise effective production and editorial control measures. Overall Summary of Study Findings The study found that the great majority (95.8%) of all IAM items were rated as present or easily achieved in the subject system. Twenty-seven of the items (56.3%) were present, nineteen (39.6%) were judged as easily achieved, and only two (4.2%) were rated as not possible or practical. Figures 11 and 12 graphically present the results of totalling items from all sections of the study. >>>> FIGURES 11 AND 12 GO HERE The investigator noted that all nineteen of the easily achieved items were presented as possible to implement through the use of external or third party software. He therefore further classified the purpose or functionality of the required external software into two categories. The first group included software required to actually perform a retrieval approach or function. There were four of these items. These included Items B.1.j. (access by format of document), B.1.k. (access to internal document location or position), B.1.l. (specified field access), and B.3.d. (term roles, links, or weighting). The second category was software serving the purpose of editorial or authoring process control.
This would include such functions as thesaurus maintenance, maintenance of a classification schedule, production of a faceted or string index, etc. The remaining fifteen items fell into this category. These relationships are listed in Table 4. The analysis shows that external software needs of the subject system are primarily in the area of editorial or process controls, rather than in the retrieval mechanism or function area. <TABLE4> The summary bar chart in Figure 13 presents the classification of software needs across the three IAM categories. In the Access Systems (Approaches/Methods) and the Control Mechanisms categories, the external software needs were similarly heavily weighted towards the editorial or process controls area. Only the Access Points category fell outside this pattern, with all three of its external software items required to perform the missing retrieval function. >>>> FIGURE 13 GOES HERE This concludes the Results and Discussion of Findings chapter. The final chapter will present the summary and conclusions of the study, and make recommendations for further research. It will also offer generalizations from this specific system case study that may be applied to the broader hypertext genre.